5 research outputs found

    A Sweet Recipe for Consolidated Vulnerabilities: Attacking a Live Website by Harnessing a Killer Combination of Vulnerabilities

    Full text link
    The recent emergence of new vulnerabilities is an epoch-making problem in the complex world of website security. Most of the websites are failing to keep updating to tackle their websites from these new vulnerabilities leaving without realizing the weakness of the websites. As a result, when cyber-criminals scour such vulnerable old version websites, the scanner will represent a set of vulnerabilities. Once found, these vulnerabilities are then exploited to steal data, distribute malicious content, or inject defacement and spam content into the vulnerable websites. Furthermore, a combination of different vulnerabilities is able to cause more damages than anticipation. Therefore, in this paper, we endeavor to find connections among various vulnerabilities such as cross-site scripting, local file inclusion, remote file inclusion, buffer overflow CSRF, etc. To do so, we develop a Finite State Machine (FSM) attacking model, which analyzes a set of vulnerabilities towards the road to finding connections. We demonstrate the efficacy of our model by applying it to the set of vulnerabilities found on two live websites.Comment: Accepted at 5th International Conference on Networking, Systems and Security (5th NSysS 2018

    bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents

    Full text link
    Despite the existence of numerous Optical Character Recognition (OCR) tools, the lack of comprehensive open-source systems hampers the progress of document digitization in various low-resource languages, including Bengali. Low-resource languages, especially those with an alphasyllabary writing system, suffer from the lack of large-scale datasets for various document OCR components such as word-level OCR, document layout extraction, and distortion correction; which are available as individual modules in high-resource languages. In this paper, we introduce Bengali..AI-BRACU-OCR (bbOCR): an open-source scalable document OCR system that can reconstruct Bengali documents into a structured searchable digitized format that leverages a novel Bengali text recognition model and two novel synthetic datasets. We present extensive component-level and system-level evaluation: both use a novel diversified evaluation dataset and comprehensive evaluation metrics. Our extensive evaluation suggests that our proposed solution is preferable over the current state-of-the-art Bengali OCR systems. The source codes and datasets are available here: https://bengaliai.github.io/bbocr

    BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset

    Full text link
    While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first multidomain large Bengali Document Layout Analysis Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples from six domains - i) books and magazines, ii) public domain govt. documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking the performance of existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset in training deep learning based Bengali document digitization models

    OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

    Full text link
    We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, Bengali portrays large diversity in dialects and prosodic features, which demands ASR frameworks to be robust towards distribution shifts. For example, islamic religious sermons in Bengali are delivered with a tonality that is significantly different from regular speech. Our training dataset is collected via massively online crowdsourcing campaigns which resulted in 1177.94 hours collected and curated from 22,64522,645 native Bengali speakers from South Asia. Our test dataset comprises 23.03 hours of speech collected and manually annotated from 17 different sources, e.g., Bengali TV drama, Audiobook, Talk show, Online class, and Islamic sermons to name a few. OOD-Speech is jointly the largest publicly available speech dataset, as well as the first out-of-distribution ASR benchmarking dataset for Bengali
    corecore